data607 wk 12

library(knitr)
opts_chunk$set(tidy.opts=list(width.cutoff=60,tidy=TRUE))

I am having issues connecing to MySQL, and, as a workaround, I will be importing csv files directly. To be able to do that, flights folder with all the entities data has to be saved under the default import location in Neo4j folder because the directory is predefined. However, to make the code reproducible, one can adjust neo4j.conf file by commenting out the line “dbms.directories.import=import” which should allow to import files from any location (includign remote URLs). Note, the below only imports data without creating nodes or relationships yet.

#con <- dbConnect(MySQL(), user='root', password='', host='localhost')
#flights <- dbGetQuery(con, "SELECT *  from flights)

library(RNeo4j)
graph = startGraph("http://localhost:7474/db/data/")
                # username = "neo4j", password = "")
# if saving folder locally, then can use file:///flights/airlines.csv (change file name accordingly)

airlines <- "LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/agCS/DATA607/master/airlines.csv' AS row
CREATE (n:airlines {
  carrier: row.carrier,
  name: row.name})"
cypher(graph, airlines)

airports <- "LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/agCS/DATA607/master/airports.csv' AS row
CREATE (n:airports {
  faa: row.faa,
  name: row.name, 
  lat: row.lat,
  lon: row.lon,
  alt: row.alt,
  tz: row.tz,
  dst: row.dst})"
cypher(graph, airports)

weather <- "LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/agCS/DATA607/master/weather.csv' AS row
CREATE (n:weather {
  origin: row.origin,
  year: row.year,
  month: row.month,
  day: row.day,
  hour: row.hour,
  temp: row.temp,
  dewp: row.dewp,
  humid: row.humid, 
  wind_speed: row.wind_speed,
  wind_gust: row.wind_gust,
  precip: row.precip,
  pressure: row.pressure,
  visib: row.visib})" 
cypher(graph, weather)

planes <- "LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/agCS/DATA607/master/planes.csv' AS row
CREATE (n:planes {
  tailnum: row.tailnum,     
  year: row.year,   
  type: row.type,   
  manufacturer: row.manufacturer,   
  model: row.model,
  engines: row.engines, 
  seats: row.seats, 
  speed: row.speed, 
  engine: row.engine })"
cypher(graph, planes)

flights <- "USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/agCS/DATA607/master/flights.csv' AS row 
CREATE (n:flights {
  year: row.year,
  month: row.month,
  day: row.days,
  dep_time: row.dep_time,
  dep_delay: row.dep_delay,
  arr_time: row.arr_time,
  arr_delay: row.arr_delay, 
  carrier: row.carrier,
  tailnum: row.tailnum,
  flight: row.flight,
  origin: row.origin,
  dest: row.dest,
  air_time: row.air_time,
  distance: row.distance,
  hour: row.hour,
  minute: row.minute })"
cypher(graph, flights)

Advantages and disadvantages of storing the data in a relational database vs. NoSQL database (in no particular order):
1. NoSQL databases allow to store different types of data together.
2. Because of easier scalability, NoSQL allow for effective cloud-based storage and rapid processing of real-time big data in affordable ways.
3. There is no need to develop a detailed database model with NoSQL allowing for a quicker development and testing. This, however, could be a disadvantage as well, as often not enough thought put into the graph modelling hindering the process later.
4. SQL have established community with more and better resources available.
5. SQL has standardized query language, NoSQL lacks in it currently.
6. NoSQL data don’t have to be normalized, as opposed to SQL.
To sum up, there is no better or worse when comparing SQL vs. NoSQL, rather each is relevant and suitable for specific tasks and should be chosen appropriately.

data607 wk 12

Albina Gallyavova

11/17/2017