Week 13 assignment - NoSQL migration

For this assignment, you should take information from a relational database and migrate it to a NoSQL database of your own choosing.

For the relational database, you might use the flights database, the tb database, the “data skills” database your team created for Project 3, or another database of your own choosing or creation.

For the NoSQL database, you may use MongoDB (which we introduced in week 7), Neo4j, or another NoSQL database of your choosing.

Your migration process needs to be reproducible. R code is encouraged, but not required. You should also briefly describe the advantages and disadvantages of storing the data in a relational database vs. your NoSQL database.

library(RMongo)
library(RMySQL)
library(jsonlite)
library(knitr)

For this project, I choose to use the movie datatables I stored in MYSQL for the week 2 assignment:

Open the MYSQL database connection

#Hide actual connection due to password 

#connection <- dbConnect(MySQL(), user='root', password='*******', 
                 #dbname='movies_db', host='127.0.0.1')

#dbListTables(connection)

Fetch the data with dbGetQuery

imdb_info_rs <- dbGetQuery(connection , "select * from imdb_table;")
ratings_rs <- dbGetQuery(connection , "select * from rating_dim;")
genre_rs <- dbGetQuery(connection , "select * from genre_dim;")

Disconnect from the MYSQL DB

dbDisconnect(connection)
## [1] TRUE
connection<-NA

Connect to the MongoDB

mongo <- mongoDbConnect("Movies_New", "localhost", 27017)  

Create a function for inputting the data into the MongoDB NOSQL environment using the jsonlite library

insert_data<-function(connection,db,df){
  df_json<-toJSON(df)
  
  #1 clean up
  df_json<-sub("[","",df_json, fixed = TRUE)
  df_json<-sub("]","",df_json, fixed = TRUE)
  df_json<-as.character(df_json)
  
  #2 insert document
  dbInsertDocument(connection, db, df_json)
  
}

Call the function on imdb_info_rs, ratings_rs & genre_rs

#movies, function is called on each row of the dataframe
for(i in 1:nrow(imdb_info_rs)) {
  insert_data(mongo,"movie_info_mongo",imdb_info_rs[i,])
}
#ratings, function is called on each row of the dataframe
for(i in 1:nrow(ratings_rs)) {
  insert_data(mongo,"ratings",ratings_rs[i,])
}
#genre,function is called on each row of the dataframe
for(i in 1:nrow(genre_rs)) {
  insert_data(mongo,"genre",genre_rs[i,])
}

Fetch information from the MongoDB

kable(RMongo::dbGetQuery(mongo, "movie_info_mongo",'{"Year": "1994"}')) 
Year Rank_IMBD Votes_IMDB Title X_id Title_Generated_Number
1994 1 1071904 The Shawshank Redemption 5ae6370497767b38538368ae 573
1994 4 830504 Pulp Fiction 5ae6370497767b38538368b1 576
1994 18 711386 Forrest Gump 5ae6370497767b38538368bf 590
1994 1 1071904 The Shawshank Redemption 5ae63d2b9776ec8652c7bc18 573
1994 4 830504 Pulp Fiction 5ae63d2b9776ec8652c7bc1b 576
1994 18 711386 Forrest Gump 5ae63d2b9776ec8652c7bc29 590
1994 1 1071904 The Shawshank Redemption 5ae63d789776b08760d9f308 573
1994 4 830504 Pulp Fiction 5ae63d789776b08760d9f30b 576
1994 18 711386 Forrest Gump 5ae63d789776b08760d9f319 590
1994 1 1071904 The Shawshank Redemption 5ae63d9b97769ab151d7fb35 573
1994 4 830504 Pulp Fiction 5ae63d9b97769ab151d7fb38 576
1994 18 711386 Forrest Gump 5ae63d9b97769ab151d7fb46 590
1994 1 1071904 The Shawshank Redemption 5ae63df39776507c6f907119 573
1994 4 830504 Pulp Fiction 5ae63df39776507c6f90711c 576
1994 18 711386 Forrest Gump 5ae63df39776507c6f90712a 590
kable(RMongo::dbGetQuery(mongo, "ratings",'{"Average__Ranking": 16}')) 
Average__Ranking Liam_Ranking Heather_Ranking Mom_Ranking Meaghan_Ranking X_id Title_Generated_Number Dad_Ranking
16 3 17 22 13 5ae6370697767b38538368db 588 26
16 14 17 13 15 5ae6370697767b38538368dd 590 19
16 24 28 16 9 5ae6370697767b38538368de 591 3
16 23 5 23 25 5ae6370697767b38538368e0 593 5
16 3 17 22 13 5ae63d2b9776ec8652c7bc45 588 26
16 14 17 13 15 5ae63d2b9776ec8652c7bc47 590 19
16 24 28 16 9 5ae63d2b9776ec8652c7bc48 591 3
16 23 5 23 25 5ae63d2b9776ec8652c7bc4a 593 5
16 3 17 22 13 5ae63d789776b08760d9f335 588 26
16 14 17 13 15 5ae63d789776b08760d9f337 590 19
16 24 28 16 9 5ae63d789776b08760d9f338 591 3
16 23 5 23 25 5ae63d789776b08760d9f33a 593 5
16 3 17 22 13 5ae63d9b97769ab151d7fb62 588 26
16 14 17 13 15 5ae63d9b97769ab151d7fb64 590 19
16 24 28 16 9 5ae63d9b97769ab151d7fb65 591 3
16 23 5 23 25 5ae63d9b97769ab151d7fb67 593 5
16 3 17 22 13 5ae63df39776507c6f907146 588 26
16 14 17 13 15 5ae63df39776507c6f907148 590 19
16 24 28 16 9 5ae63df39776507c6f907149 591 3
16 23 5 23 25 5ae63df39776507c6f90714b 593 5
kable(RMongo::dbGetQuery(mongo, "genre",'{"genre": "Action"}')) 
genre X_id Title_Generated_Number
Action 5ae6370897767b38538368ee 577
Action 5ae6370897767b38538368ef 578
Action 5ae6370897767b38538368f2 581
Action 5ae6370897767b38538368f4 583
Action 5ae6370897767b38538368f5 584
Action 5ae6370897767b38538368f8 587
Action 5ae6370897767b38538368f9 588
Action 5ae6370897767b38538368fa 589
Action 5ae6370897767b38538368fc 591
Action 5ae6370897767b38538368fd 592
Action 5ae6370897767b3853836900 595
Action 5ae6370897767b3853836901 596
Action 5ae6370897767b3853836904 599
Action 5ae6370897767b3853836905 600
Action 5ae63d2b9776ec8652c7bc58 577
Action 5ae63d2b9776ec8652c7bc59 578
Action 5ae63d2b9776ec8652c7bc5c 581
Action 5ae63d2b9776ec8652c7bc5e 583
Action 5ae63d2b9776ec8652c7bc5f 584
Action 5ae63d2b9776ec8652c7bc62 587
Action 5ae63d2b9776ec8652c7bc63 588
Action 5ae63d2b9776ec8652c7bc64 589
Action 5ae63d2b9776ec8652c7bc66 591
Action 5ae63d2b9776ec8652c7bc67 592
Action 5ae63d2b9776ec8652c7bc6a 595
Action 5ae63d2b9776ec8652c7bc6b 596
Action 5ae63d2b9776ec8652c7bc6e 599
Action 5ae63d2b9776ec8652c7bc6f 600
Action 5ae63d799776b08760d9f348 577
Action 5ae63d799776b08760d9f349 578
Action 5ae63d799776b08760d9f34c 581
Action 5ae63d799776b08760d9f34e 583
Action 5ae63d799776b08760d9f34f 584
Action 5ae63d799776b08760d9f352 587
Action 5ae63d799776b08760d9f353 588
Action 5ae63d799776b08760d9f354 589
Action 5ae63d799776b08760d9f356 591
Action 5ae63d799776b08760d9f357 592
Action 5ae63d799776b08760d9f35a 595
Action 5ae63d799776b08760d9f35b 596
Action 5ae63d799776b08760d9f35e 599
Action 5ae63d799776b08760d9f35f 600
Action 5ae63d9b97769ab151d7fb75 577
Action 5ae63d9b97769ab151d7fb76 578
Action 5ae63d9b97769ab151d7fb79 581
Action 5ae63d9b97769ab151d7fb7b 583
Action 5ae63d9b97769ab151d7fb7c 584
Action 5ae63d9b97769ab151d7fb7f 587
Action 5ae63d9b97769ab151d7fb80 588
Action 5ae63d9b97769ab151d7fb81 589
Action 5ae63d9b97769ab151d7fb83 591
Action 5ae63d9b97769ab151d7fb84 592
Action 5ae63d9b97769ab151d7fb87 595
Action 5ae63d9b97769ab151d7fb88 596
Action 5ae63d9b97769ab151d7fb8b 599
Action 5ae63d9b97769ab151d7fb8c 600
Action 5ae63df39776507c6f907159 577
Action 5ae63df39776507c6f90715a 578
Action 5ae63df39776507c6f90715d 581
Action 5ae63df39776507c6f90715f 583
Action 5ae63df39776507c6f907160 584
Action 5ae63df39776507c6f907163 587
Action 5ae63df39776507c6f907164 588
Action 5ae63df39776507c6f907165 589
Action 5ae63df39776507c6f907167 591
Action 5ae63df39776507c6f907168 592
Action 5ae63df39776507c6f90716b 595
Action 5ae63df39776507c6f90716c 596
Action 5ae63df39776507c6f90716f 599
Action 5ae63df39776507c6f907170 600

MSQL VS SQL

Advantages of NOSQL: no relationships- easier to manage/more flexibility. More scalability Disadvantages: less standardization -> consistency issues

Advantages of SQL: relationships- easier to search and create actionable insights Disadvantages: storage/scalability