For this assignment, you should take information from a relational database and migrate it to a NoSQL database of your own choosing.
For the relational database, you might use the flights database, the tb database, the “data skills” database your team created for Project 3, or another database of your own choosing or creation.
For the NoSQL database, you may use MongoDB (which we introduced in week 7), Neo4j, or another NoSQL database of your choosing.
Your migration process needs to be reproducible. R code is encouraged, but not required. You should also briefly describe the advantages and disadvantages of storing the data in a relational database vs. your NoSQL database.
library(RMongo)
library(RMySQL)
library(jsonlite)
library(knitr)
For this project, I choose to use the movie datatables I stored in MYSQL for the week 2 assignment:
#Hide actual connection due to password
#connection <- dbConnect(MySQL(), user='root', password='*******',
#dbname='movies_db', host='127.0.0.1')
#dbListTables(connection)
imdb_info_rs <- dbGetQuery(connection , "select * from imdb_table;")
ratings_rs <- dbGetQuery(connection , "select * from rating_dim;")
genre_rs <- dbGetQuery(connection , "select * from genre_dim;")
dbDisconnect(connection)
## [1] TRUE
connection<-NA
mongo <- mongoDbConnect("Movies_New", "localhost", 27017)
insert_data<-function(connection,db,df){
df_json<-toJSON(df)
#1 clean up
df_json<-sub("[","",df_json, fixed = TRUE)
df_json<-sub("]","",df_json, fixed = TRUE)
df_json<-as.character(df_json)
#2 insert document
dbInsertDocument(connection, db, df_json)
}
#movies, function is called on each row of the dataframe
for(i in 1:nrow(imdb_info_rs)) {
insert_data(mongo,"movie_info_mongo",imdb_info_rs[i,])
}
#ratings, function is called on each row of the dataframe
for(i in 1:nrow(ratings_rs)) {
insert_data(mongo,"ratings",ratings_rs[i,])
}
#genre,function is called on each row of the dataframe
for(i in 1:nrow(genre_rs)) {
insert_data(mongo,"genre",genre_rs[i,])
}
kable(RMongo::dbGetQuery(mongo, "movie_info_mongo",'{"Year": "1994"}'))
| Year | Rank_IMBD | Votes_IMDB | Title | X_id | Title_Generated_Number |
|---|---|---|---|---|---|
| 1994 | 1 | 1071904 | The Shawshank Redemption | 5ae6370497767b38538368ae | 573 |
| 1994 | 4 | 830504 | Pulp Fiction | 5ae6370497767b38538368b1 | 576 |
| 1994 | 18 | 711386 | Forrest Gump | 5ae6370497767b38538368bf | 590 |
| 1994 | 1 | 1071904 | The Shawshank Redemption | 5ae63d2b9776ec8652c7bc18 | 573 |
| 1994 | 4 | 830504 | Pulp Fiction | 5ae63d2b9776ec8652c7bc1b | 576 |
| 1994 | 18 | 711386 | Forrest Gump | 5ae63d2b9776ec8652c7bc29 | 590 |
| 1994 | 1 | 1071904 | The Shawshank Redemption | 5ae63d789776b08760d9f308 | 573 |
| 1994 | 4 | 830504 | Pulp Fiction | 5ae63d789776b08760d9f30b | 576 |
| 1994 | 18 | 711386 | Forrest Gump | 5ae63d789776b08760d9f319 | 590 |
| 1994 | 1 | 1071904 | The Shawshank Redemption | 5ae63d9b97769ab151d7fb35 | 573 |
| 1994 | 4 | 830504 | Pulp Fiction | 5ae63d9b97769ab151d7fb38 | 576 |
| 1994 | 18 | 711386 | Forrest Gump | 5ae63d9b97769ab151d7fb46 | 590 |
| 1994 | 1 | 1071904 | The Shawshank Redemption | 5ae63df39776507c6f907119 | 573 |
| 1994 | 4 | 830504 | Pulp Fiction | 5ae63df39776507c6f90711c | 576 |
| 1994 | 18 | 711386 | Forrest Gump | 5ae63df39776507c6f90712a | 590 |
kable(RMongo::dbGetQuery(mongo, "ratings",'{"Average__Ranking": 16}'))
| Average__Ranking | Liam_Ranking | Heather_Ranking | Mom_Ranking | Meaghan_Ranking | X_id | Title_Generated_Number | Dad_Ranking |
|---|---|---|---|---|---|---|---|
| 16 | 3 | 17 | 22 | 13 | 5ae6370697767b38538368db | 588 | 26 |
| 16 | 14 | 17 | 13 | 15 | 5ae6370697767b38538368dd | 590 | 19 |
| 16 | 24 | 28 | 16 | 9 | 5ae6370697767b38538368de | 591 | 3 |
| 16 | 23 | 5 | 23 | 25 | 5ae6370697767b38538368e0 | 593 | 5 |
| 16 | 3 | 17 | 22 | 13 | 5ae63d2b9776ec8652c7bc45 | 588 | 26 |
| 16 | 14 | 17 | 13 | 15 | 5ae63d2b9776ec8652c7bc47 | 590 | 19 |
| 16 | 24 | 28 | 16 | 9 | 5ae63d2b9776ec8652c7bc48 | 591 | 3 |
| 16 | 23 | 5 | 23 | 25 | 5ae63d2b9776ec8652c7bc4a | 593 | 5 |
| 16 | 3 | 17 | 22 | 13 | 5ae63d789776b08760d9f335 | 588 | 26 |
| 16 | 14 | 17 | 13 | 15 | 5ae63d789776b08760d9f337 | 590 | 19 |
| 16 | 24 | 28 | 16 | 9 | 5ae63d789776b08760d9f338 | 591 | 3 |
| 16 | 23 | 5 | 23 | 25 | 5ae63d789776b08760d9f33a | 593 | 5 |
| 16 | 3 | 17 | 22 | 13 | 5ae63d9b97769ab151d7fb62 | 588 | 26 |
| 16 | 14 | 17 | 13 | 15 | 5ae63d9b97769ab151d7fb64 | 590 | 19 |
| 16 | 24 | 28 | 16 | 9 | 5ae63d9b97769ab151d7fb65 | 591 | 3 |
| 16 | 23 | 5 | 23 | 25 | 5ae63d9b97769ab151d7fb67 | 593 | 5 |
| 16 | 3 | 17 | 22 | 13 | 5ae63df39776507c6f907146 | 588 | 26 |
| 16 | 14 | 17 | 13 | 15 | 5ae63df39776507c6f907148 | 590 | 19 |
| 16 | 24 | 28 | 16 | 9 | 5ae63df39776507c6f907149 | 591 | 3 |
| 16 | 23 | 5 | 23 | 25 | 5ae63df39776507c6f90714b | 593 | 5 |
kable(RMongo::dbGetQuery(mongo, "genre",'{"genre": "Action"}'))
| genre | X_id | Title_Generated_Number |
|---|---|---|
| Action | 5ae6370897767b38538368ee | 577 |
| Action | 5ae6370897767b38538368ef | 578 |
| Action | 5ae6370897767b38538368f2 | 581 |
| Action | 5ae6370897767b38538368f4 | 583 |
| Action | 5ae6370897767b38538368f5 | 584 |
| Action | 5ae6370897767b38538368f8 | 587 |
| Action | 5ae6370897767b38538368f9 | 588 |
| Action | 5ae6370897767b38538368fa | 589 |
| Action | 5ae6370897767b38538368fc | 591 |
| Action | 5ae6370897767b38538368fd | 592 |
| Action | 5ae6370897767b3853836900 | 595 |
| Action | 5ae6370897767b3853836901 | 596 |
| Action | 5ae6370897767b3853836904 | 599 |
| Action | 5ae6370897767b3853836905 | 600 |
| Action | 5ae63d2b9776ec8652c7bc58 | 577 |
| Action | 5ae63d2b9776ec8652c7bc59 | 578 |
| Action | 5ae63d2b9776ec8652c7bc5c | 581 |
| Action | 5ae63d2b9776ec8652c7bc5e | 583 |
| Action | 5ae63d2b9776ec8652c7bc5f | 584 |
| Action | 5ae63d2b9776ec8652c7bc62 | 587 |
| Action | 5ae63d2b9776ec8652c7bc63 | 588 |
| Action | 5ae63d2b9776ec8652c7bc64 | 589 |
| Action | 5ae63d2b9776ec8652c7bc66 | 591 |
| Action | 5ae63d2b9776ec8652c7bc67 | 592 |
| Action | 5ae63d2b9776ec8652c7bc6a | 595 |
| Action | 5ae63d2b9776ec8652c7bc6b | 596 |
| Action | 5ae63d2b9776ec8652c7bc6e | 599 |
| Action | 5ae63d2b9776ec8652c7bc6f | 600 |
| Action | 5ae63d799776b08760d9f348 | 577 |
| Action | 5ae63d799776b08760d9f349 | 578 |
| Action | 5ae63d799776b08760d9f34c | 581 |
| Action | 5ae63d799776b08760d9f34e | 583 |
| Action | 5ae63d799776b08760d9f34f | 584 |
| Action | 5ae63d799776b08760d9f352 | 587 |
| Action | 5ae63d799776b08760d9f353 | 588 |
| Action | 5ae63d799776b08760d9f354 | 589 |
| Action | 5ae63d799776b08760d9f356 | 591 |
| Action | 5ae63d799776b08760d9f357 | 592 |
| Action | 5ae63d799776b08760d9f35a | 595 |
| Action | 5ae63d799776b08760d9f35b | 596 |
| Action | 5ae63d799776b08760d9f35e | 599 |
| Action | 5ae63d799776b08760d9f35f | 600 |
| Action | 5ae63d9b97769ab151d7fb75 | 577 |
| Action | 5ae63d9b97769ab151d7fb76 | 578 |
| Action | 5ae63d9b97769ab151d7fb79 | 581 |
| Action | 5ae63d9b97769ab151d7fb7b | 583 |
| Action | 5ae63d9b97769ab151d7fb7c | 584 |
| Action | 5ae63d9b97769ab151d7fb7f | 587 |
| Action | 5ae63d9b97769ab151d7fb80 | 588 |
| Action | 5ae63d9b97769ab151d7fb81 | 589 |
| Action | 5ae63d9b97769ab151d7fb83 | 591 |
| Action | 5ae63d9b97769ab151d7fb84 | 592 |
| Action | 5ae63d9b97769ab151d7fb87 | 595 |
| Action | 5ae63d9b97769ab151d7fb88 | 596 |
| Action | 5ae63d9b97769ab151d7fb8b | 599 |
| Action | 5ae63d9b97769ab151d7fb8c | 600 |
| Action | 5ae63df39776507c6f907159 | 577 |
| Action | 5ae63df39776507c6f90715a | 578 |
| Action | 5ae63df39776507c6f90715d | 581 |
| Action | 5ae63df39776507c6f90715f | 583 |
| Action | 5ae63df39776507c6f907160 | 584 |
| Action | 5ae63df39776507c6f907163 | 587 |
| Action | 5ae63df39776507c6f907164 | 588 |
| Action | 5ae63df39776507c6f907165 | 589 |
| Action | 5ae63df39776507c6f907167 | 591 |
| Action | 5ae63df39776507c6f907168 | 592 |
| Action | 5ae63df39776507c6f90716b | 595 |
| Action | 5ae63df39776507c6f90716c | 596 |
| Action | 5ae63df39776507c6f90716f | 599 |
| Action | 5ae63df39776507c6f907170 | 600 |
Advantages of NOSQL: no relationships- easier to manage/more flexibility. More scalability Disadvantages: less standardization -> consistency issues
Advantages of SQL: relationships- easier to search and create actionable insights Disadvantages: storage/scalability