library(DBI)
library(RMySQL)
library(tidyverse)
library(knitr)
library(nycflights13)
dim(nycflights13::airlines)
## [1] 16 2
dim(nycflights13::airports)
## [1] 1458 8
head(nycflights13::airports)
## # A tibble: 6 x 8
## faa name lat lon alt tz dst tzone
## <chr> <chr> <dbl> <dbl> <int> <dbl> <chr> <chr>
## 1 04G Lansdowne Airport 41.1 -80.6 1044 -5. A Amer~
## 2 06A Moton Field Municipal Airport 32.5 -85.7 264 -6. A Amer~
## 3 06C Schaumburg Regional 42.0 -88.1 801 -6. A Amer~
## 4 06N Randall Airport 41.4 -74.4 523 -5. A Amer~
## 5 09J Jekyll Island Airport 31.1 -81.4 11 -5. A Amer~
## 6 0A9 Elizabethton Municipal Airport 36.4 -82.2 1593 -5. A Amer~
Once the connection is established, view the tables.
mySQLdb =dbConnect(MySQL(),user='DATA607User',password=mypassword,dbname='CUNY_DATA607',host='localhost')
dbListTables(mySQLdb)
## [1] "airlines" "airports" "movie" "rating" "reviewer"
dbRemoveTable(mySQLdb, name='airlines')
## [1] TRUE
dbWriteTable(mySQLdb, name='airlines', value=nycflights13::airlines)
## [1] TRUE
dbRemoveTable(mySQLdb, name='airports')
## [1] TRUE
dbWriteTable(mySQLdb, name='airports', value=nycflights13::airports)
## [1] TRUE
dbListTables(mySQLdb)
## [1] "airlines" "airports" "movie" "rating" "reviewer"
Fetching the records frm airlines table
airlines_rs = dbSendQuery(mySQLdb, 'select * from airlines')
airlines.df <- fetch(airlines_rs)
kable(head(airlines.df))
row_names | carrier | name |
---|---|---|
1 | 9E | Endeavor Air Inc. |
2 | AA | American Airlines Inc. |
3 | AS | Alaska Airlines Inc. |
4 | B6 | JetBlue Airways |
5 | DL | Delta Air Lines Inc. |
6 | EV | ExpressJet Airlines Inc. |
airports_rs = dbSendQuery(mySQLdb, 'select * from airports')
airports.df <- fetch(airports_rs,n = -1)
kable(head(airports.df))
row_names | faa | name | lat | lon | alt | tz | dst | tzone |
---|---|---|---|---|---|---|---|---|
1 | 04G | Lansdowne Airport | 41.13047 | -80.61958 | 1044 | -5 | A | America/New_York |
2 | 06A | Moton Field Municipal Airport | 32.46057 | -85.68003 | 264 | -6 | A | America/Chicago |
3 | 06C | Schaumburg Regional | 41.98934 | -88.10124 | 801 | -6 | A | America/Chicago |
4 | 06N | Randall Airport | 41.43191 | -74.39156 | 523 | -5 | A | America/New_York |
5 | 09J | Jekyll Island Airport | 31.07447 | -81.42778 | 11 | -5 | A | America/New_York |
6 | 0A9 | Elizabethton Municipal Airport | 36.37122 | -82.17342 | 1593 | -5 | A | America/New_York |
nrow(airports.df)
## [1] 1458
Created a developer account on https://mlab.com and also created a database mdbdata607
library(mongolite)
## Warning: package 'mongolite' was built under R version 3.4.4
Mongodb is a document oriented NoSQL database which stores values in key-value pair (json objects) and stored in their respective collection (collection is similar to Table of RDBMS)
#mongo db user name and password hidden in RMarkdown
#mdbUrl <- 'mongodb://<username>:<password>@ds041603.mlab.com:41603/mdbdata607'
airlineCollection <- mongo(collection = 'airlines',url = mdbUrl )
airlineCollection$drop()
airlineCollection$insert(airlines.df)
## List of 5
## $ nInserted : num 16
## $ nMatched : num 0
## $ nRemoved : num 0
## $ nUpserted : num 0
## $ writeErrors: list()
airportCollection <- mongo(collection = 'airports',url = mdbUrl )
airportCollection$drop()
airportCollection$insert(airports.df)
## List of 5
## $ nInserted : num 1458
## $ nMatched : num 0
## $ nRemoved : num 0
## $ nUpserted : num 0
## $ writeErrors: list()
kable(head(airportCollection$find()))
row_names | faa | name | lat | lon | alt | tz | dst | tzone |
---|---|---|---|---|---|---|---|---|
1 | 04G | Lansdowne Airport | 41.13047 | -80.61958 | 1044 | -5 | A | America/New_York |
2 | 06A | Moton Field Municipal Airport | 32.46057 | -85.68003 | 264 | -6 | A | America/Chicago |
3 | 06C | Schaumburg Regional | 41.98934 | -88.10124 | 801 | -6 | A | America/Chicago |
4 | 06N | Randall Airport | 41.43191 | -74.39156 | 523 | -5 | A | America/New_York |
5 | 09J | Jekyll Island Airport | 31.07447 | -81.42778 | 11 | -5 | A | America/New_York |
6 | 0A9 | Elizabethton Municipal Airport | 36.37122 | -82.17342 | 1593 | -5 | A | America/New_York |
Comparing MySQL and Mongo DB dataframes (TRUE indicates successful Migration)
mdb.airlines.df <- airlineCollection$find()
mdb.airlines.df== airlines.df
## row_names carrier name
## 1 TRUE TRUE TRUE
## 2 TRUE TRUE TRUE
## 3 TRUE TRUE TRUE
## 4 TRUE TRUE TRUE
## 5 TRUE TRUE TRUE
## 6 TRUE TRUE TRUE
## 7 TRUE TRUE TRUE
## 8 TRUE TRUE TRUE
## 9 TRUE TRUE TRUE
## 10 TRUE TRUE TRUE
## 11 TRUE TRUE TRUE
## 12 TRUE TRUE TRUE
## 13 TRUE TRUE TRUE
## 14 TRUE TRUE TRUE
## 15 TRUE TRUE TRUE
## 16 TRUE TRUE TRUE
MongoDb is a Document database which stores information in pair (key-value). Each key with a complex data structure known as a document. Documents can contain many different key-value pairs, or key-array pairs, or even nested documents. It is very useful if the use case document centric.
SQL databases on the other hand are most commonly used. It is very efficient for storing structured data and where organizing data in realtional manner is recommended.
SQL databases are by design ACID (Atomicity Consistency Isolation Durability) compliant whereas NoSQL databases are not. So, selection of database is really driven by the type of use case and whether the system needs ACID capability or not.