noSQL Migration

I will be using the famous Microsoft database Northwind as my relational database to show my case for migrating MySQL rational database to noSQL database, which in my case will be Neo4j.

So, we will start with this schema stricted database where the entities are related:

to a noSQL database, in my case it’s graph database with nodes and their relations.

I started by downloading the sql dump for the NorthWind database found on the internet, and then run it locally on MySQL to generate the schema and create the database NorthWind in MySQL.

The sql script I used is stored in my github repository found here: https://github.com/theoracley/Data607/blob/master/NoSQLMigration/Northwind.MySQL5.sql

This is my NorthWind database created locally in MySQL:

Export the rational database tables into csv files

While there is an ETL tool in Neo4j that can import a rational database into Neo4j, I will not use it. Rather, I prefer to export the tables to CSV files and then import them in Neo4j using the Load_CSV Neo4j command, so I can understand how everything works.

In MySQL, by right clicking on each table and choose to export the table to csv, I was able to create few CSVs. In this case, I choose to export the following tables only: Categories, Customers, Order Details, Orders, Products and Suppliers.

Enter Neo4j

I loaded Neo4j (Community Edition) and created a Graph (aka Database). I then needed to upload my CSV files to the import directory of the Neo4j installation, so I don’t have to mention the whole path of the csv file in the cypher command.

The following cypher commands were entered in Neo4j to create the NorthWind Graph database:

// Create Constraints
CREATE CONSTRAINT ON (p:Product) ASSERT p.productID IS UNIQUE;
CREATE CONSTRAINT ON (c:Category) ASSERT c.categoryID IS UNIQUE;
CREATE CONSTRAINT ON (s:Supplier) ASSERT s.supplierID IS UNIQUE;
CREATE CONSTRAINT ON (c:Customer) ASSERT c.customerID IS UNIQUE;
CREATE CONSTRAINT ON (o:Order) ASSERT o.orderID IS UNIQUE;

// Create Indexes
CREATE INDEX ON :Product(productName);
CREATE INDEX ON :Category(categoryName);
CREATE INDEX ON :Supplier(companyName);
CREATE INDEX ON :Supplier(contactName);
CREATE INDEX ON :Supplier(city);
CREATE INDEX ON :Supplier(country);
CREATE INDEX ON :Supplier(postalCode);
CREATE INDEX ON :Customer(companyName);
CREATE INDEX ON :Customer(contactName);
CREATE INDEX ON :Customer(city);
CREATE INDEX ON :Customer(country);
CREATE INDEX ON :Customer(postalCode);
CREATE INDEX ON :Order(customerID);
CREATE INDEX ON :Order(shipName);
CREATE INDEX ON :Order(shipCity);
CREATE INDEX ON :Order(shipCountry);
CREATE INDEX ON :Order(shipPostalCode);

// Load Products
LOAD CSV WITH HEADERS FROM “file:///Products.csv” AS row
CREATE (n:Product)
SET n = row,
n.unitPrice = toFloat(row.unitPrice),
n.unitsInStock = toInt(row.unitsInStock), n.unitsOnOrder = toInt(row.unitsOnOrder),
n.reorderLevel = toInt(row.reorderLevel), n.discontinued = (toInt(row.discontinued) <> 0);

// Load Categories
LOAD CSV WITH HEADERS FROM “file:///Categories.csv” AS row
CREATE (n:Category)
SET n = row;

// Load Suppliers
LOAD CSV WITH HEADERS FROM “file:///Suppliers.csv” AS row
CREATE (n:Supplier)
SET n = row;

// Create Product to Category Relationships
MATCH (p:Product),(c:Category)
WHERE p.categoryID = c.categoryID
CREATE (p)-[:PART_OF]->(c);

// Create Product to Supplier Relationships
MATCH (p:Product),(s:Supplier)
WHERE p.supplierID = s.supplierID
CREATE (s)-[:SUPPLIES]->(p);

// Load Customers
LOAD CSV WITH HEADERS FROM “file:///Customers.csv” AS row
CREATE (n:Customer)
SET n = row;

// Load Orders
LOAD CSV WITH HEADERS FROM “file:///Orders.csv” AS row
CREATE (n:Order)
SET n = row;

// Create Customer to Order Relationships
MATCH (c:Customer),(o:Order)
WHERE c.customerID = o.customerI
D CREATE (c)-[:PURCHASED]->(o);

// Load Order Details
LOAD CSV WITH HEADERS FROM “file:///OrderDetails.csv” AS row
MATCH (p:Product), (o:Order)
WHERE p.productID = row.productID AND o.orderID = row.orderID
CREATE (o)-[details:ORDERS]->(p)
SET details = row,
details.quantity = toInt(row.quantity);

Querying the NorthWind Graph with Cypher

we now ask the Neo4j NorthWind graph database to return the nodes it has.Notice how different types of nodes are colored differently to reflet each node type. We use the Cypher query language to query the graph database.

Using R to interact with Neo4j

In our case, we use Neo4r driver to allow R to interact with Neo4j to query the database. let’ see how it works.

#install.packages("neo4r")
library(neo4r) #R packgae for Neo4j
# library(igraph)
#library(dplyr)

#create a connection to my local Neo4j database
con <- neo4j_api$new(
  url = "http://localhost:7474", 
  user = "neo4j", 
  password = "Malek@1234"
)

# Test the endpoint with the ping, and investigate the response status. 200 = connection is up
con$ping()

## [1] 200

# return all nodes of type Category
res <- 'MATCH (n:Category) RETURN n' %>%
  call_neo4j(con, type = "graph") # %>%
  # extract_relationships() %>%
  # unnest_relationships()

unnest_nodes(res$nodes)

## # A tibble: 8 x 6
##   id    value   CategoryID Description                 Picture CategoryName
##   <chr> <chr>   <chr>      <chr>                       <chr>   <chr>       
## 1 80    Catego~ 1          Soft drinks, coffees, teas~ <U+FFFD><U+FFFD><U+FFFD><U+FFFD>    Beverages   
## 2 81    Catego~ 2          Sweet and savory sauces, r~ <U+FFFD><U+FFFD><U+FFFD><U+FFFD>    Condiments  
## 3 82    Catego~ 3          Desserts, candies, and swe~ <U+FFFD><U+FFFD><U+FFFD><U+FFFD>    Confections 
## 4 83    Catego~ 4          Cheeses                     <U+FFFD><U+FFFD><U+FFFD><U+FFFD>    Dairy Produ~
## 5 84    Catego~ 5          Breads, crackers, pasta, a~ <U+FFFD><U+FFFD><U+FFFD><U+FFFD>    Grains/Cere~
## 6 85    Catego~ 6          Prepared meats              <U+FFFD><U+FFFD><U+FFFD><U+FFFD>    Meat/Poultry
## 7 86    Catego~ 7          Dried fruit and bean curd   <U+FFFD><U+FFFD><U+FFFD><U+FFFD>    Produce     
## 8 87    Catego~ 8          Seaweed and fish            <U+FFFD><U+FFFD><U+FFFD><U+FFFD>    Seafood

Conclusion

There you have it. I went from knowing nothing about Neo4j, to become more familiar with it and understanding and almost mastering graph databases and the Cypher query language. This is why I did note choose another noSQL DB like MongoDB or something else. I wanted to really understand Graph databases and through this assignment I achieve that.

So we use noSQL database instead of rational databases to stay free from the restriction of the schema in the rational DBs. Rational Databases require lot of computing for executing queries due to relational references. Rational databases cannot or impossible to scale horisontally. Vertical scaling is also limited. With noSQL databases, no schema is enforced. Instead of rows, they store documents and documents can be with different structures. No schema is enforced. No schema implies no relations. Relations are not used heavily if needed. Parents can have childs within it, so does not need to reference child records outside it, which make any referencial queries very fast.