For this assignment, you should take information from a relational database and migrate it to a NoSQL database of your own choosing.

For the relational database, you might use the flights database, the tb database, the “data skills” database your team created for Project 3, or another database of your own choosing or creation.

For the NoSQL database, you may use MongoDB (which we introduced in week 7), Neo4j, or another NoSQL database of your choosing.

Your migration process needs to be reproducible. R code is encouraged, but not required. You should also briefly describe the advantages and disadvantages of storing the data in a relational database vs. your NoSQL database.

Extracting Data from MySQL DB into R Dataframe

Install packages for MySQL DB

library(RMySQL)
## Warning: package 'RMySQL' was built under R version 3.3.3
## Loading required package: DBI
## Warning: package 'DBI' was built under R version 3.3.2
#install.packages('rJava')

Connect to MySQL Instance

mydb <- dbConnect(MySQL(), user='root', password='meZt8pkaA', host='localhost')
dbSendQuery(mydb, "USE tb")
## <MySQLResult:341708704,0,0>

Let’s store the table in tb database into a dataframe

tb_df<-dbGetQuery(mydb,"SELECT * FROM tb;")
tb_df_10 <- head(tb_df, 10)
tb_df_10
##        country year    sex child adult elderly
## 1  Afghanistan 1995 female    NA    NA      NA
## 2  Afghanistan 1995   male    NA    NA      NA
## 3  Afghanistan 1996 female    NA    NA      NA
## 4  Afghanistan 1996   male    NA    NA      NA
## 5  Afghanistan 1997 female     5    96       1
## 6  Afghanistan 1997   male     0    26       0
## 7  Afghanistan 1998 female    45  1142      20
## 8  Afghanistan 1998   male    30   500      41
## 9  Afghanistan 1999 female    25   484       8
## 10 Afghanistan 1999   male     8   212       8

Let’s close and disconnect from MySQL database, and also detach the RMySQL package

dbDisconnect(mydb)
## [1] TRUE
detach("package:RMySQL", unload=TRUE)

MONGODB

Start the MongoDB Server on your system by running the below command in the command prompt

“C:Files.4.exe”

Once the MongoDB has started with waiting message proceed to install R packages Install Packagees

library(devtools)
## Warning: package 'devtools' was built under R version 3.3.3
install_github(repo = "mongosoup/rmongodb")
## Skipping install of 'rmongodb' from a github remote, the SHA1 (8eb2bca2) has not changed since last install.
##   Use `force = TRUE` to force installation
library(rJava)
## Warning: package 'rJava' was built under R version 3.3.2
library(RMongo)
## Warning: package 'RMongo' was built under R version 3.3.3
## 
## Attaching package: 'RMongo'
## The following objects are masked from 'package:DBI':
## 
##     dbDisconnect, dbGetQuery
library(jsonlite)
## Warning: package 'jsonlite' was built under R version 3.3.3
library(stringr)
## Warning: package 'stringr' was built under R version 3.3.3
library(rmongodb)

# connect to MongoDB
mongo = mongo.create(host = "localhost",db ="tb")

#Confirm sucessful connection
mongo.is.connected(mongo)
## [1] TRUE
mongo.get.databases(mongo)
## [1] "tb"

In this section we would try to upload the dataframe into MongoDB. This assumes you have installed the MongoDB

#Convert Dataframe into BSON
tb_list=mongo.bson.from.df(tb_df)

b = mongo.bson.from.list(tb_list)

#insert BSON into MongoDB collection

mongo.insert(mongo,"tb.tb_collection1",b)
## [1] TRUE
#confirm we have data in the collection
mongo.count(mongo, "tb.tb_collection1")
## [1] 2

Query the collection that we just inserted

tmp = mongo.find.one(mongo, ns = "tb.tb_collection1")

Display some Mongo data

tb_list_10=mongo.bson.from.df(tb_df_10)

b_10 = mongo.bson.from.list(tb_list_10)

mongo.insert(mongo,"tb.tb_collection10",b_10)
## [1] TRUE
mongo.find.one(mongo, ns = "tb.tb_collection10")
##  _id : 7      5906884756b76e730ddbff8b
##  1 : 3    
##      country : 2      Afghanistan
##      year : 16    1995
##      sex : 2      female
##      child : 10   BSON_NULL
##      adult : 10   BSON_NULL
##      elderly : 10     BSON_NULL
## 
##  2 : 3    
##      country : 2      Afghanistan
##      year : 16    1995
##      sex : 2      male
##      child : 10   BSON_NULL
##      adult : 10   BSON_NULL
##      elderly : 10     BSON_NULL
## 
##  3 : 3    
##      country : 2      Afghanistan
##      year : 16    1996
##      sex : 2      female
##      child : 10   BSON_NULL
##      adult : 10   BSON_NULL
##      elderly : 10     BSON_NULL
## 
##  4 : 3    
##      country : 2      Afghanistan
##      year : 16    1996
##      sex : 2      male
##      child : 10   BSON_NULL
##      adult : 10   BSON_NULL
##      elderly : 10     BSON_NULL
## 
##  5 : 3    
##      country : 2      Afghanistan
##      year : 16    1997
##      sex : 2      female
##      child : 16   5
##      adult : 16   96
##      elderly : 16     1
## 
##  6 : 3    
##      country : 2      Afghanistan
##      year : 16    1997
##      sex : 2      male
##      child : 16   0
##      adult : 16   26
##      elderly : 16     0
## 
##  7 : 3    
##      country : 2      Afghanistan
##      year : 16    1998
##      sex : 2      female
##      child : 16   45
##      adult : 16   1142
##      elderly : 16     20
## 
##  8 : 3    
##      country : 2      Afghanistan
##      year : 16    1998
##      sex : 2      male
##      child : 16   30
##      adult : 16   500
##      elderly : 16     41
## 
##  9 : 3    
##      country : 2      Afghanistan
##      year : 16    1999
##      sex : 2      female
##      child : 16   25
##      adult : 16   484
##      elderly : 16     8
## 
##  10 : 3   
##      country : 2      Afghanistan
##      year : 16    1999
##      sex : 2      male
##      child : 16   8
##      adult : 16   212
##      elderly : 16     8

Exploratory

Lets confirm that the data can be queried in MongoDB

library(RMongo)
library(knitr)


 
 query = mongo.bson.buffer.create()
mongo.bson.buffer.append(query, 'country', c("Afghanistan","Algeria"))
## [1] TRUE
result =mongo.bson.from.buffer(query)  
result
##  country : 4      
##      0 : 2    Afghanistan
##      1 : 2    Algeria

advantages and disadvantages of storing the data in a relational database vs. your NoSQL database.

Advantages of storing Data in Relational Database 1. Minimize redudancy of data. Data could be stored once and maintained at one place but referenced many other places 2. It prevents data becoming out of sync. Database Atomicity property, you can either have a debit or a credit 3. It helps prevent bad data like a phone number that has alphabets 4. Relational database can perform certain numeric computations at a faster rate than NOSQL 5. Requires less processing effort

Advantages of NOSQL over Relational Database 1. When you are deal with unstructured data like documents, photos, videos 2. It offers much more flexibility in data storage 3. Storage structure is intuitive for example a blog post like with pictures, videos, and likes can be stored in one place 4. NoSQL have a better relationship with object oriented programing langugaes 5. NoSQL makes the most out of cloud computing 6. Rapid development is encouraged in NOSQL shop due to the design or architecture is not needed