Data 607 Week-12 Assignment

Ask

For this assignment, you should take information from a relational database and migrate it to a NoSQL database of your own choosing. For the relational database, you might use the flights database, the tb database, the “data skills” database your team created for Project 3, or another database of your own choosing or creation. For the NoSQL database, you may use MongoDB (which we introduced in week 7), Neo4j, or another NoSQL database of your choosing. Your migration process needs to be reproducible. R code is encouraged, but not required. You should also briefly describe the advantages and disadvantages of storing the data in a relational database vs. your NoSQL database.

For this assignment, I have decided to use the MySQL as my RDBMS, and MongoDB as my NoSQL database. Also, the flights database used earlier in Data 607 assignments has been used here.

The first step is to load the MySQL database with the Flights database. The sql script to load the Flights database is present here: https://raw.githubusercontent.com/deepakmongia/Fall2018/master/Data-607/flights.sql The above script needs to be run before the below.

Step-1 : Loading the required R packages:

library(mongolite)
library(RMySQL)

## Loading required package: DBI

Step-2 : Connecting the MySQL database:

mydb = dbConnect(MySQL(), user='root', password='Deepak1234#', dbname='flights', host='localhost')

summary(mydb)

## <MySQLConnection:0,0>
##   User:   root 
##   Host:   localhost 
##   Dbname: flights 
##   Connection type: localhost via TCP/IP 
## 
## Results:

Now assuming that the Flights database has been loaded on to MySQL, we will use R to pull the data into R from MySQL.

Step-3 : Pulling data from MySQL

airlines.df <- dbGetQuery(mydb, "
                          select * from airlines")
head(airlines.df)

##   carrier                       name
## 1      9E        Endeavor Air Inc.\r
## 2      AA   American Airlines Inc.\r
## 3      AS     Alaska Airlines Inc.\r
## 4      B6          JetBlue Airways\r
## 5      DL     Delta Air Lines Inc.\r
## 6      EV ExpressJet Airlines Inc.\r

airports.df <- dbGetQuery(mydb, "
                          select * from airports")
head(airports.df)

##   faa                           name      lat       lon  alt tz dst
## 1 04G              Lansdowne Airport 41.13047 -80.61958 1044 -5   A
## 2 06A  Moton Field Municipal Airport 32.46057 -85.68003  264 -5   A
## 3 06C            Schaumburg Regional 41.98934 -88.10124  801 -6   A
## 4 06N                Randall Airport 41.43191 -74.39156  523 -5   A
## 5 09J          Jekyll Island Airport 31.07447 -81.42778   11 -4   A
## 6 0A9 Elizabethton Municipal Airport 36.37122 -82.17342 1593 -4   A

flights.df <- dbGetQuery(mydb, "
                          select * from flights")
head(flights.df)

##   year month day dep_time dep_delay arr_time arr_delay carrier tailnum
## 1 2013     1   1      517         2      830        11      UA  N14228
## 2 2013     1   1      533         4      850        20      UA  N24211
## 3 2013     1   1      542         2      923        33      AA  N619AA
## 4 2013     1   1      544        -1     1004       -18      B6  N804JB
## 5 2013     1   1      554        -6      812       -25      DL  N668DN
## 6 2013     1   1      554        -4      740        12      UA  N39463
##   flight origin dest air_time distance hour minute
## 1   1545    EWR  IAH      227     1400    5     17
## 2   1714    LGA  IAH      227     1416    5     33
## 3   1141    JFK  MIA      160     1089    5     42
## 4    725    JFK  BQN      183     1576    5     44
## 5    461    LGA  ATL      116      762    6     54
## 6   1696    EWR  ORD      150      719    6     54

planes.df <- dbGetQuery(mydb, "
                          select * from planes")
head(planes.df)

##   tailnum year                    type     manufacturer     model engines
## 1  N10156 2004 Fixed wing multi engine          EMBRAER EMB-145XR       2
## 2  N102UW 1998 Fixed wing multi engine AIRBUS INDUSTRIE  A320-214       2
## 3  N103US 1999 Fixed wing multi engine AIRBUS INDUSTRIE  A320-214       2
## 4  N104UW 1999 Fixed wing multi engine AIRBUS INDUSTRIE  A320-214       2
## 5  N10575 2002 Fixed wing multi engine          EMBRAER EMB-145LR       2
## 6  N105UW 1999 Fixed wing multi engine AIRBUS INDUSTRIE  A320-214       2
##   seats speed    engine
## 1    55    NA Turbo-fan
## 2   182    NA Turbo-fan
## 3   182    NA Turbo-fan
## 4   182    NA Turbo-fan
## 5    55    NA Turbo-fan
## 6   182    NA Turbo-fan

weather.df <- dbGetQuery(mydb, "
                          select * from weather")
head(weather.df)

##   origin year month day hour  temp  dewp humid wind_dir wind_speed
## 1    EWR 2013     1   1    0 37.04 21.92 53.97      230   10.35702
## 2    EWR 2013     1   1    1 37.04 21.92 53.97      230   13.80936
## 3    EWR 2013     1   1    2 37.94 21.92 52.09      230   12.65858
## 4    EWR 2013     1   1    3 37.94 23.00 54.51      230   13.80936
## 5    EWR 2013     1   1    4 37.94 24.08 57.04      240   14.96014
## 6    EWR 2013     1   1    6 39.02 26.06 59.37      270   10.35702
##   wind_gust precip pressure visib
## 1  11.91865      0   1013.9    10
## 2  15.89154      0   1013.0    10
## 3  14.56724      0   1012.6    10
## 4  15.89154      0   1012.7    10
## 5  17.21583      0   1012.8    10
## 6  11.91865      0   1012.0    10

Step-4 : Inserting in the same structure format into MongoDB:

mongo_airlines <- mongo(collection = "airlines", db = "flights")
mongo_airlines$insert(airlines.df)

## List of 5
##  $ nInserted  : num 16
##  $ nMatched   : num 0
##  $ nRemoved   : num 0
##  $ nUpserted  : num 0
##  $ writeErrors: list()

mongo_airports <- mongo(collection = "airports", db = "flights")
mongo_airports$insert(airports.df)

## List of 5
##  $ nInserted  : num 1397
##  $ nMatched   : num 0
##  $ nRemoved   : num 0
##  $ nUpserted  : num 0
##  $ writeErrors: list()

mongo_flights <- mongo(collection = "flights", db = "flights")
mongo_flights$insert(flights.df)

## List of 5
##  $ nInserted  : num 336776
##  $ nMatched   : num 0
##  $ nRemoved   : num 0
##  $ nUpserted  : num 0
##  $ writeErrors: list()

mongo_planes <- mongo(collection = "planes", db = "flights")
mongo_planes$insert(planes.df)

## List of 5
##  $ nInserted  : num 3322
##  $ nMatched   : num 0
##  $ nRemoved   : num 0
##  $ nUpserted  : num 0
##  $ writeErrors: list()

mongo_weather <- mongo(collection = "weather", db = "flights")
mongo_weather$insert(weather.df)

## List of 5
##  $ nInserted  : num 8719
##  $ nMatched   : num 0
##  $ nRemoved   : num 0
##  $ nUpserted  : num 0
##  $ writeErrors: list()

Even though we inserted here in the exact same structure as the MySQL into MongoDB, but MongoDB or any other document database is more effective with saving documents in the form of embedded document structures like JSON.

Comparison of RDMBMS and NoSQL databases:

RDBMS:

Advantages:

Usage of uniform language across all the RDBMSes - SQL

ACID properties - Atomicity, Consistency, Isolation, Durability

Easy to use and maintain

Disadvantages:

Does not work really well and becomes difficult to manage as the DB size grows

Unable to handle unstructured data

NoSQL databases:

Advantages:

Can handle unstructured data very well

Easily scalable as it is schema free

The data is stored in same structure as it is presented

Disadvantages:

ACID properties are given away

data redundancy may happen

Data 607 Week-12 Assignment

Deepak Mongia

November 25, 2018

Ask