For Project 4, you should take information from a relational database and migrate it to a NoSQL database of your own choosing. For the relational database, you might use the flights database, the tb database, the “data skills” database your team created for Project 3, or another database of your own choosing or creation.For the NoSQL database, you may use MongoDB, Neo4j (which we introduce in Week 12), or another NoSQL database of your choosing. Your migration process needs to be reproducible. R code is encouraged, but not required. You should also briefly describe the advantages and disadvantages of storing the data in a relational database vs. your NoSQL database.
library(devtools)
library(plyr)
library(rjson)
library(rmongodb)
library(RMySQL)
db_connection <- dbConnect(drv = dbDriver("MySQL"), user = 'root', password = "Secure123", host = 'localhost', dbname='test_flights')
airlines_sql <- dbReadTable(db_connection, "airlines")
airports_sql <- dbReadTable(db_connection, "airports")
flights_sql <- dbReadTable(db_connection, "flights")
planes_sql <- dbReadTable(db_connection, "planes")
weather_sql <- dbReadTable(db_connection, "weather")
dbDisconnect(db_connection)
write.table(airlines_sql, file = "~/MSDA/DATA 607/Data/airlines.csv", sep = ",", row.names = FALSE)
write.table(airports_sql, file = "~/MSDA/DATA 607/Data/airports.csv", sep = ",", row.names = FALSE)
write.table(flights_sql, file = "~/MSDA/DATA 607/Data/flights.csv", sep = ",", row.names = FALSE)
write.table(planes_sql, file = "~/MSDA/DATA 607/Data/planes.csv", sep = ",", row.names = FALSE)
write.table(weather_sql, file = "~/MSDA/DATA 607/Data/weather.csv", sep = ",", row.names = FALSE)
library(rmongodb)
mongo = mongo.create(host = "localhost")
mongo.is.connected(mongo)
Time to import the data into mongo db, I used this as a reference to import csv files into mongodb. https://www.mkyong.com/mongodb/mongodb-import-and-export-example/
mongoimport -db flights -collection airlines --type csv --file "~/MSDA/DATA 607/Data/airlines.csv" --headerline
We can use the statement like above to import all the data from other table as a collection in mongoDB. For simplicity I only imported airlines table into R but we can use to to import all the data from the flights dataset.
Now we will compare the results of the mongoDB with the mySQL by quering MongoDb and storing the objects in R.
library(rmongodb)
mongo = mongo.create(host = "localhost")
mongo.is.connected(mongo)
airlines_mongo <- mongo.cursor.to.data.frame(mongo.find(mongo, "flights.airlines", '{}'), stringsAsFactors = FALSE)
We can see that the number of records for the airlines tables is same for both the data frames:
summary(airlines_mongo)
summary(airlines_sql)
dim(airlines_mongo)
dim(airlines_sql)
Relational database (RDBMS) like SQL has been the primary model for database management during the past few decades. But today, non-relational, “NoSQL” databases are gaining prominence as an alternative model for database management.
Scaling: In traditional relational DB world scaling always relied on scaling up or buying bigger, expensive, multiple servers as database load increased rather than scaling out or distributing the database across multiple hosts. In NoSQL db the nodes can be added or removed and scaled seamlessly without application downtime.
Low Maintenance for NoSQL and low cost servers: Relational DB needs a lot of maintenance and requires expertise in DB administration, But with the NoSQL cluster its easier to maintain and requires less management with simpler data models
Schema less data model: In NoSQL Data can be inserted in database without first defining a schema. So the format or data model being inserted can be changed any time, without application disruption. This provides immense application and business flexibility. Whereas in relational DB its not possible without affecting the application layer.
Very Nascent stage: NoSQL is still in very nascent stage and not many robust solutions like relational db are available.
Expertise: Since relational db’s been around forever, there are alot of expertise available in the market and still have a vast reach in the enterprise world.