MySQL to NoSQL

For this assignment, you should take information from a relational database and migrate it to a NoSQL database of your own choosing.

Flights

We begin by loading the flights database into R from MySQL by creating a connection and downloading the tables.

#knit with parameters to be prompted for a MySQL password

#load libraries
library(DBI)
library(RMySQL)
library(mongolite)
library(knitr)

#create a connection to the flights database in MySQL
flights_db = dbConnect(MySQL(), user='root', password= params$dbPassword, dbname='flights', host='localhost')
dbListTables(flights_db)
## [1] "airlines" "airports" "flights"  "planes"   "weather"
#load the flights tables
flights<-dbGetQuery(flights_db,"SELECT * FROM flights;")
airlines<-dbGetQuery(flights_db,"SELECT * FROM airlines;")
airports<-dbGetQuery(flights_db,"SELECT * FROM airports;")
planes<-dbGetQuery(flights_db,"SELECT * FROM planes;")
weather<-dbGetQuery(flights_db,"SELECT * FROM weather;")

#end connection to MySQL
dbDisconnect(flights_db)
## [1] TRUE

Next we load the tables into data frames in R.

#put flights tables into data frames
myflights <- data.frame(flights)
myairlines <- data.frame(airlines)
myairports <- data.frame(airports)
myplanes <- data.frame(planes)
myweather <- data.frame(weather)

#print the first few rows of the weather data frame
kable(head(myweather))
origin year month day hour temp dewp humid wind_dir wind_speed wind_gust precip pressure visib
EWR 2013 1 1 0 37.04 21.92 53.97 230 10.35702 11.91865 0 1013.9 10
EWR 2013 1 1 1 37.04 21.92 53.97 230 13.80936 15.89154 0 1013.0 10
EWR 2013 1 1 2 37.94 21.92 52.09 230 12.65858 14.56724 0 1012.6 10
EWR 2013 1 1 3 37.94 23.00 54.51 230 13.80936 15.89154 0 1012.7 10
EWR 2013 1 1 4 37.94 24.08 57.04 240 14.96014 17.21583 0 1012.8 10
EWR 2013 1 1 6 39.02 26.06 59.37 270 10.35702 11.91865 0 1012.0 10

Now we can load these data frames into a MongoDB database using the mongolite package in R.

#create a connection to MongoDB for the myflights data frame
mongodb <- mongo(collection = "myflights", db = "flights_db", url = "mongodb://localhost",
      verbose = FALSE, options = ssl_options())

#insert the myflights data frame 
mongodb$insert(myflights)
## List of 5
##  $ nInserted  : num 336776
##  $ nMatched   : num 0
##  $ nRemoved   : num 0
##  $ nUpserted  : num 0
##  $ writeErrors: list()
#create a connection to MongoDB for the myairlines data frame
mongodb <- mongo(collection = "myairlines", db = "flights_db", url = "mongodb://localhost",
                 verbose = FALSE, options = ssl_options())

#insert the myairlines data frame
mongodb$insert(myairlines)
## List of 5
##  $ nInserted  : num 16
##  $ nMatched   : num 0
##  $ nRemoved   : num 0
##  $ nUpserted  : num 0
##  $ writeErrors: list()
#create a connection to MongoDB for the myairports data frame
mongodb <- mongo(collection = "myairports", db = "flights_db", url = "mongodb://localhost",
                 verbose = FALSE, options = ssl_options())

#insert the myairports data frame
mongodb$insert(myairports)
## List of 5
##  $ nInserted  : num 1397
##  $ nMatched   : num 0
##  $ nRemoved   : num 0
##  $ nUpserted  : num 0
##  $ writeErrors: list()
#create a connection to MongoDB for the myplanes data frame
mongodb <- mongo(collection = "myplanes", db = "flights_db", url = "mongodb://localhost",
                 verbose = FALSE, options = ssl_options())

#insert the myplanes data frame
mongodb$insert(myplanes)
## List of 5
##  $ nInserted  : num 3322
##  $ nMatched   : num 0
##  $ nRemoved   : num 0
##  $ nUpserted  : num 0
##  $ writeErrors: list()
#create a connection to MongoDB for the myweather data frame
mongodb <- mongo(collection = "myweather", db = "flights_db", url = "mongodb://localhost",
                 verbose = FALSE, options = ssl_options())

#insert the myweather data frame
mongodb$insert(myweather)
## List of 5
##  $ nInserted  : num 8719
##  $ nMatched   : num 0
##  $ nRemoved   : num 0
##  $ nUpserted  : num 0
##  $ writeErrors: list()

And finally we query the mongodb data base to show that the tables where loaded the MondoDB.

#query the flights mongodb to show that the flights table was inserted
mongoflights <- mongodb$find(limit = 5)
mongoflights
##   origin year month day hour  temp  dewp humid wind_dir wind_speed
## 1    EWR 2013     1   1    0 37.04 21.92 53.97      230   10.35702
## 2    EWR 2013     1   1    1 37.04 21.92 53.97      230   13.80936
## 3    EWR 2013     1   1    2 37.94 21.92 52.09      230   12.65858
## 4    EWR 2013     1   1    3 37.94 23.00 54.51      230   13.80936
## 5    EWR 2013     1   1    4 37.94 24.08 57.04      240   14.96014
##   wind_gust precip pressure visib
## 1  11.91865      0   1013.9    10
## 2  15.89154      0   1013.0    10
## 3  14.56724      0   1012.6    10
## 4  15.89154      0   1012.7    10
## 5  17.21583      0   1012.8    10

Compare relational databases with NoSQL databases

A relational database like MySQL uses tables which can be linked and queried using a structured query language. SQL databases are vertically scalable which allows for easliy increasing storage capacity. Some benefits of relational databases are maturity, compatibility, replicability, and cost effectivenes.

NoSQL databases are not table based, but rather store the data in column-oriented, document-oriented, graph-based or organized as a KeyValue store and therefore are more flexible in their format. The language used to query is a dynamic schema for unstructured data. A NoSQL database like MongoDB for example stores data in json form. NoSQL databases are horizontally scalable which allows for more traffic. Some benefits are scalability, manageablity, speed, and flexibility.