Tasks required

For this assignment, you should take information from a relational database and migrate it to a NoSQL database of your own choosing.

For the relational database, you might use the flights database, the tb database, the “data skills” database your team created for Project 3, or another database of your own choosing or creation.

For the NoSQL database, you may use MongoDB (which we introduced in week 7), Neo4j, or another NoSQL database of your choosing.

Your migration process needs to be reproducible. R code is encouraged, but not required. You should also briefly describe the advantages and disadvantages of storing the data in a relational database vs. your NoSQL database.

Connecting to flights database through RMysql

In week 01 we were taught how to load flights database into MySQL workbench. I will be working on that database for this assignment.

Load the library

library(RMySQL)
## Loading required package: DBI

Connect to MySQL through R

mydb = dbConnect(MySQL(), user='root', password='Lexicon4321', host='127.0.0.1')

Use the flights Schema

dbSendQuery(mydb,'USE flights;')
## <MySQLResult:0,0,0>

Check for the number of tables in the database. We see that there are five tables under the names airlines, airports, flights, planes, weather.

dbListTables(mydb)
## [1] "airlines" "airports" "flights"  "planes"   "weather"

Load the five tables into R

airlines <- dbGetQuery(mydb, "SELECT * FROM airlines")
airports <- dbGetQuery(mydb, "SELECT * FROM airports")
flightdetails <- dbGetQuery(mydb, "SELECT * FROM flights")
planes <- dbGetQuery(mydb, "SELECT * FROM planes")
weather <- dbGetQuery(mydb, "SELECT * FROM weather")

Check if the tables are loaded

head(airports)
##   faa                           name      lat       lon  alt tz dst
## 1 04G              Lansdowne Airport 41.13047 -80.61958 1044 -5   A
## 2 06A  Moton Field Municipal Airport 32.46057 -85.68003  264 -5   A
## 3 06C            Schaumburg Regional 41.98934 -88.10124  801 -6   A
## 4 06N                Randall Airport 41.43191 -74.39156  523 -5   A
## 5 09J          Jekyll Island Airport 31.07447 -81.42778   11 -4   A
## 6 0A9 Elizabethton Municipal Airport 36.37122 -82.17342 1593 -4   A

Close connection to MySQL

#dbDisconnect(mydb)

Loading the data into MongoDB

Loading the library

library(mongolite)

Create collection called Airlines and load airlines table into the collection

flights <- mongo(collection = "Airlines", db = "flights")
flights$insert(airlines)

Reset the collection to Airports to load airports table into the collection

flights <- mongo(collection = "Airports", db = "flights")
flights$insert(airports)

Let us do the same for the rest of the three tables

flights <- mongo(collection = "Flightdetails", db = "flights")
flights$insert(flightdetails)
flights <- mongo(collection = "Planes", db = "flights")
flights$insert(planes)
flights <- mongo(collection = "Weather", db = "flights")
flights$insert(weather)

Let us check if the tables are loaded into mongodb, we see a screen like below:

knitr::include_graphics('mongodb.png')

I would now like to check if the number of records are the same for the tables. I will compare my last mongodb table inserted (weather) with my R dataframe called weather

flights$count() == nrow(weather)

Relational vs Non relational

Reasons to use a Relational database

1.You need ACID compliancy (Atomicity, Consistency, Isolation, Durability). ACID compliancy reduces anomalies and protects the integrity of your database. It does this by defining exactly how transactions interact with the database, which is not the case with NoSQL databases, which have a primary goal of flexibility and speed, rather than 100% data integrity.

2.Your data is structured and unchanging. If your business is not growing exponentially, there may be no reason to use a system designed to support a variety of data types and high traffic volume.

Reasons to use a non relational database

To prevent the database from becoming a system-wide bottleneck, especially in high volume environments, NoSQL databases perform in a way that relational databases cannot.

The following features are driving the popularity of NoSQL databases like MongoDB, CouchDB, Cassandra, and HBase:

  1. Storing large volumes of data without structure. A NoSQL database doesn’t limit storable data types. Plus, you can add new types as business needs change.

  2. Using cloud computing and storage. Cloud-based storage is a great solution, but it requires data to be easily spread across multiple servers for scaling. Using affordable hardware on-site for testing and then for production in the cloud is what NoSQL databases are designed for.

  3. Rapid development. If you are developing using modern agile methodologies, a relational database will slow you down. A NoSQL database doesn’t require the level of preparation typically needed for relational databases.

Source: https://blog.panoply.io/sql-or-nosql-that-is-the-question